Our data

Our data consist of soil certified reference materials with 0, 100, 1000, 10000, 1000000 ppm of lead, Antimony, Tungsten, and Zinc (contact Ms. Ashley Mossell at Ashley.M.Mossell@usace.army.mil, Ms. Holly Vermeulen at Holly.L.Haber@usace.army.mil, or Dr. Jay Clausen at Jay.L.Clausen@usace.army.mil for more information regarding the certified reference materials). This report will however focus only on the lead sample. 96 soil samples have also been examined.We intent to use the LIBS technique to determine the total contents of the heavy metals in the the reference soils. . In order to validate the technique, LIBS data will be compared with data obtained on the same soil samples by application of conventional Inductively Coupled Plasma ICP spectroscopy.
Firs we illustrate the form of the raw data containing the soil certified reference materials for lead at 100 ppm in the table below.

In the table above, the first column represent the wavelength value while the column 2 through 10 represent the 9 different LIBS intensity shots for the samples. it can be observed from the table that certain intensity values were negative. We hence assumed every negative value to be equal to zero and then calculate the mean of the nine shots to obtain the transform data presented below for lead at the 100 ppm.

Similar methodologies were employed for the lead data at 0, 1000, 10000, and 100000 ppm respectively. A table containing all the mean lead intensities can be found in the table below

Basic analysis of the data

Using linear equation to fit the intensity and concentration data

Using a linear equation in the form \(I(\lambda) = mC(\lambda)+ b\), where \(I(\lambda)\) and \(C(\lambda)\) are respectively the mean intensity and the theoretical concentration (i.e. 0, 100, 1000, 10000, and 100000 ppm) at each of the wavelength \(\lambda_i\), we estimate the slope \(m\) and \(R^2\) values in order to determine the correct calibration curve. the code below were used in R to perform the calculation.

L <- length(Lead_Data[,1])# length of the intensity data

# create an empty vector for the slope, the intercept and Rsquare values
slope <- numeric(0L)
intercept <- numeric(0L)
Rsquare <- numeric(0L)

# Loop through the entire data and create a linear model where the mean intensity is the dependent variable and the theoretical concentration is the independent variable. 
for (i in 1:L){
  linear_relation <- lm(as.numeric(Lead_Data[i,2:6])~Concentration)
  sum_relation <- summary(linear_relation)
  
  #extract coefficients from linear regression
  slope <- c(slope,linear_relation$coefficients[2])
  intercept <- c(intercept,linear_relation$coefficients[1])
  Rsquare <- slope <- c(Rsquare,sum_relation$r.squared)
}

Estimating the concentration for a reference soil using the linear model

We continue our analysis using the inverse linear model from the previous section to estimate the concentration. This can be done by doing \(C(\lambda_i) = \frac{I(\lambda_i) - b(\lambda_i)}{m(\lambda_i)}\). Moreover, we calculate the average of the nine shots of the reference soils by assuming any intensity value less than 0 is equal to 0. Then, the intensity of these mean reference soil were utilized to estimate the concentration at each wavelength \(\lambda\). The following R code were used to conduct these analysis over the entire spectrum.

Wavelength <- Lead_Data[,1] # extract the wavelength from our data table
Soil_B1 <- CRREL_Soil$B1 # reference soil 

# Calculate concentration base on wavelength
Estimated_Concentration <- numeric(L) # empty vector of concentration

for (j in 1:L){
  Estimated_Concentration[j] <- (Soil_B1[j] - intercept[j])/slope[j] 
}

# data containing the wavelength, slop, intercept, R^2, soil B1, and estimate concentration
Coeff <- data.frame(Wavelength,slope,intercept,Rsquare,Soil_B1,
                    Estimated_Concentration)

datatable(Coeff)

Now a visual illustration of the \(R^2\) value with respect to the wavelength shows how the linear accuracy is changing with respect to wavelength.

######################################################
black.bold.text1 <- element_text(face = "bold", color = "black",size=26) # x and y axis
black.bold.text2 <- element_text(face = "bold", color = "black",size=22) # title

p<-theme(axis.text.x = element_text(face="bold", color="black", size=22),
         axis.text.y = element_text(face="bold", color="black", size=22),
         title=black.bold.text2,axis.title = black.bold.text1, legend.position = "bottom",
         legend.text = element_text(size=24),strip.text.x = element_text(face="bold",size=22))
##############################################################


p1 <- ggplot(data = Coeff, aes(x = Wavelength, y = Rsquare)) +
  geom_line() + xlab(" Wavelength ") + ylab(" Rsquare values") +
  ggtitle("Wavelength vs. Rsquare") + p

p2 <- ggplot(data = Coeff, aes(x = Wavelength, y = Estimated_Concentration)) +
  geom_line() + xlab(" Wavelength") + ylab(" Estimated Concentration") +
  ggtitle("Wavelength vs. estimated concentration for soil B1") + p

plot_grid(p1, p2, labels = "AUTO")

We could noticed that several of the estimated concentration were negative, indicating that these are obviously not the appropriate wavelength for lead. However, as stated by De Lucia C. F. (2011), a typical LIBS spectrum is made up of multiple emission lines primarly due to atomic species, thus we expect that most elements will have multiple emission lines. We are hence faced with the following questions :

  • How can one accurately handle LIBS intensity values that are negative as in our case?
    • Is our procedure of assuming all negative values are zero and taking an average of all the nine shots correct?
    • What does negative intensity value imply?
  • Is it alright to estimate the concentration at each wavelength or is it better to use the equation with the highest \(R^2\) value to estimate concentration for all soils?
  • When mutltiple emission lines have a high \(R^2\), what wavelength should be used to accurately estimate concentration for other reference soils